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Abstract 

A new method for Ewald summation in planar/slablike geometry, i.e. systems where 
periodicity applies in two dimensions and the last dimension is "free" (2P), is presented. 
We employ a spectral representation in terms of both Fourier series and integrals. This 
allows us to concisely derive both the 2P Ewald sum and a fast PME-type method suitable 
for large-scale computations. The primary results are: (i) close and illuminating connec- 
tions between the 2P problem and the standard Ewald sum and associated fast methods 
for full periodicity; (ii) a fast, 0(N log N), and spectrally accurate PME-type method for 
the 2P k-space Ewald sum that uses vastly less memory than traditional PME methods; 
(Hi) errors that decouple, such that parameter selection is simplified. We give analytical 
and numerical results to support this. 



1 Introduction 

Ewald summation deals with the task of summing the Coulomb potential over a set of charged 
particles that are subject to periodic boundary conditions. The potential sum itself may be 
written as 

N 

where {x.j,qj},j = 1...N, with x G 12 C M 3 and J^Qj = (neutrality) , represents the 
location and charge of N particles. For simplicity we let f2 = [0, L) 3 . Periodicity is expressed 
by a translation function / : A — > M 3 , and A = Z d denotes indices in a ci-dimensional lattice, 
d= 1,2,3. 

Complications, which depend on the dimension of periodicity, d, arises because the terms 
in (1) decay ~ 1/r. In fact, a certain amount of ambiguity surrounds direct summation of 
(1), see the appropriately named paper by Takemoto et. al. [52]. 

The present work deals with the accurate (spectrally) and efficient (N log N) computation 
of the potential sum under two-dimensional periodicity (the third dimension is "free"), i.e. 
when A = Z 2 , and /(p) = [pi,P2,0] (see Figure 1). We shall start by briefly surveying the 
fully periodic case. 
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Figure 1: 2P system: Unit cell f2 (black) repeated infinitely in the plane. 



1.1 3P: Fully extended periodicity 

In the most common situation, periodicity is extended in all three dimensions, i.e. that A = 1? 
and /(p) = Lp. This problem has been thoroughly studied, going back to the eponymous 
Ewald, who in 1921 [15] showed that (1) can be computed by splitting the sum into a rapidly 
decaying part and a smooth part which is summed in frequency domain, 
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where £ > (which 99 is independent of) is known as the Ewald parameter, k3 G {27rn/L : 
n G Z 3 }, = |k|, and the term (n = m, p = 0) is excluded from the real space sum. 

The utility of Ewald summation was greatly enhanced by the development of 0(N log N) 
methods, avoiding the severely limiting 0(iV 2 ) complexity of evaluating (2) for all x m . We 
denote by PME (Particle Mesh Ewald) the well known family of methods that derive from 
the pioneering P 3 M method by Hockney and Eastwood [25], including major developments 
such as the PME method due to Darden et. al. [7] and the SPME method by Essmann 
et. al. [14]. The consistency within the PME family is illustrated in the excellent surveys 
by Deserno and Holm [11], and Shan et. al. [49]. In a survey of electrostatic calculations 
in structural biology, an important application area, Koehl [33] points out that the success 
of Ewald's method overshadows other methods and that the applications community has 
flourished because of this. 



1.2 2P: Planar periodicity 

We shall denote the situation when periodicity applies in two dimensions and the third di- 
mension is free as planar periodicity or 2P, as illustrated in Figure 1. One may think of 
a sheet or lamina of charges confined by z G [0, L] and infinitely replicated in the (x, y)- 
plane. In the literature this situation is sometimes referred to as slab / slablike geometry or 
a quasi-two-dimensional system and enjoys a wealth of acronyms, such as 3D2P1F (i.e. a 
three-dimensional system, with two periodic directions, and one free). 
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As noted, a satisfactory way to sum the 3P problem came about in the 1920's and work 
on fast methods took off in the 1990's, based on the Ewald sum (2). In contrast, analysis 
and methods for the 2P problem lagged quite far behind, and fast methods have yet to reach 
the maturity of their 3P cousins. A summation formula analogous to (2) has emerged, but 
fundamentally different (i.e. non-Ewald) ideas are also being pursued. 

This result, which we shall refer to as the 2P Ewald sum, was derived by Grzybowski, 
Gwozdz and Brodka in [20] using lattice sums. Here, the potential sum 

N 

p(x) = EE n ^ — =nr, (3) 

iS-f^ ||x-X n + P ||' 

with p = L\pi,p2, 0], is shown to equal 
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where r e M 2 is the (x, y)-component of x, and k 6 {27rn/L : n G Z 2 }. 

Their approach follows a classical derivation of the 3P Ewald sum by de Leeuw et. al. [10] . 
Interestingly, and as Grzybowski et. al. point out, the exact same expression can be obtained 
from much earlier work by Bertaut [4] and, more recently, by Heyes et. al. [23, 21, 24, 22]. 
However, it is also attributed to de Leeuw and Perram [9] by other authors (e.g. [32]). 
Another group with a strong claim of independently developing the 2P Ewald sum is Rhee 
et. al. [46]. Among the foremost in early developments was Parry [42, 43], whose results are 
drawn upon by Heyes and others. In the context of the present work, it is appropriate to 
highlight Grzybowski et. al. [20] as a modern and accessible reference. Irrespective of how 
one traces the lineage of (4), evaluating it for all x m has the dreaded 0{N 2 ) complexity (with 
a very large constant) without hinting how a fast method might arise. 

There are several alternatives to the 2P Ewald sum (4). An interesting non-Ewald method 
is known as Lekner summation, due to J. Lekner [35, 36], which obtains series that converge 
faster than the Ewald sum. The reader is referred to the excellent survey by Mazars [39] 
for more details, including comparisons between Lekner and Ewald sums, and appropriate 
caveats. Arnold and Holm [3] suggest a "convergence factor" approach - obtaining a non- 
Ewald method that goes by the name MMM2D, and is related to the Lekner sum. They, for 
the first time, show a priori error bounds for the 2P problem. 

Another important alternative to (4) is, somewhat brazenly, to use the 3P Ewald sum (2) 
instead. The idea here is to extend the unit cell in the z-direction, creating a gap that separates 
sheets of charged particles (periodicity in all three directions is implied). Convergence is 
expected because the artificial sheets have no net charge. This was investigated by Spohr 
[50], where it is indicated, computationally for a simple system, that (2) converges to (4) as 
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the gap widens. Various methods have been proposed that introduce correction terms to the 
3P sum, such as the method due to Yeh and Berkowitz [54] (see also Crozier et. al. [6]). 
Present in these references (and in the works cited therein) are, to a varying extent, additional 
assumptions and physically motivated simplifications that do not immediately generalize. The 
level of accuracy attained by these methods is seen as inadequate by present standards. 

Moreover, the errors introduced by extending the problem to 3D periodicity turn out 
to be quite subtle. In a pair of papers [2, 8], Arnold, Holm and de Joannis show that by 
formally summing in a planar fashion (rather then spherically, as is implied in the Ewald sum 
(2)) additional terms emerge. This lets them formulate a correction term which enables high 
accuracy and good error control when used in conjunction with their MMM2D method [3]. 
They also apply established PME methods to the extended problem, obtaining a fast method. 
The work by Holm et. al. deserves much credit for clarity, appropriate rigor, and a level of 
general applicability which is lacking in much of the preceding work. 

There also exists methods that aim to improve the efficiency of evaluating the 2P Ewald 
sum (4). In a collection of papers Kawata and collaborators [29, 28, 30] propose a method 
which relies on an integral transform that was also used by Parry [42, 43] (see clarifying 
correspondence [38, 31]). The same authors have also proposed a SPME-like method [32] 
that relies on the same ideas. However, even the determined reader may struggle to gain 
clarity from these sources - and the practical accuracy of their methods appears to be low 
and hard to control. This is regrettable, as we believe that their basic premises are quite 
useful. This shall be elaborated on throughout the present work. 

Recent work includes Ewald- related methods due to S. Goedecker and collaborators, such 
as the mixed Ewald-finite element method by Ghasemi et. al. [18] and related work [17, 41]. 

Of the available methods, the work by Holm et. al. appears to be the one most widely 
used. This might well be a consequence of their proximity to established 3P methods - 
which would explain why other recent work enjoys less attention. For instance, the idea that 
Ghasemi et. al. purse [18] (using a tailored finite element method in the z-direction) adds 
significantly to the mathematical and practical complexity of the problem at hand vis-a-vis 
3P methods. 

We agree with the view promoted by Holm et. al., that there is much value in having 
methods for the 2P problem that maintain a close relationship to the mature 3P methods. 
However, we believe that extending the problem to full periodicity, and then laboring exten- 
sively over correction terms to compensate, is a somewhat blunt approach. 

In the present work, we shall use a more subtle approach that avoids the extension to full 
periodicity, yet is consistent with the 3P Ewald framework. It starts from representing func- 
tions with planar periodicity (2P) using both Fourier series (in the periodic (x, y)-directions) 
and a Fourier integral (in the free z-direction) . We show that this admits a natural deriva- 
tion of the 2P Ewald sum (4). Furthermore, we shall see that an intermediate step in this 
derivation is a natural starting point from which a fast, O(NlogN), and spectrally accurate 
method can be developed. We derive this method and motivate it theoretically and with 
computational examples. 

2 Ewald summation in planar periodicity 
2.1 Preliminaries 

Start by defining a set of functions of mixed periodicity (2D periodic ID free). 
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Definition 2.1 (2P functions). Let Vh denote the set of functions f(x,y,z) that are periodic 
in (x, y) £ £1 and "free" in zeR. Denote Vq 3 /(r, z),r = (x, y) £ ft. We shall refer to these 
as functions of mixed periodicity. Functions in Vq have a discrete spectrum corresponding to 
the periodic directions, and a continuous spectrum corresponding to z. We let Vq B /(r, z) ^ 
/(k, k), k € {2vrn/L : n G Z 2 }, k£R. 

We assume that /(r, z) and all it's derivatives decay faster than any inverse power of z 
in the limit \z\ — >■ oo, and that / n |/(r, z)| 2 dr < co for all z. Then / exists and represents 



In fact, in the present work we mostly deal with Gaussians, e~ ar , i.e. the fixed point of the 
Fourier transform (a Schwartz function). As far as spectral properties are concerned, this is 
a very strong setting. 

We shall need several fundamental results from Fourier analysis, including Poisson sum- 
mation, Parseval/Plancherel's formula and the convolution theorem. Typically, these results 
are given for either free-space or periodic functions, see e.g. Pinsky [44, Ch. 4] and Vretblad 
[53, pp. 175-181]. For functions in Vq we have the following: 

Lemma 2.1 (Poisson summation). Let /(x) G Vq have Fourier transform f, and let 



f £V n (with ft = [0,L) 2 ): 




(5) 



.2 



Then. 




where p = [Lp, 0] and x =: (r, z). 



Lemma 2.2 (Parseval/Plancherel). Let /(x) 6 Vh have Fourier transform f. Then, 




Lemma 2.3 (Parseval/Plancherel variant). Let /(x),g(x) G V^ have Fourier transform f 
and g respectively. Then, 




f(r,z)g(r,z)drdz = — / ^ /(k, n)g(k, n)dn. 




Lemma 2.4 (Convolution). The convolution of /(x), <?(x) G Vh is defined as 



(f*g)(r,z)= j j f(r-T>,z-z>)g(r',z , )dr , dz', 



and satisfies 



h(r,z) = (f * g)(r,z) h(k, k) = f(k, n)g(k, k). 
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2.2 Deriving the 2P Ewald sum 



From these definitions and properties we now derive the 2P Ewald sum (4) in a way that 
naturally sets the stage for our PME-type method (Section 3). The objective is to compute 

N 

p(x) = V V Tl ^— , 

p« 2 n=lH X - X « + Pll 

where p = L[p, 0], p G Z 2 . The traditional way to derive the (3P) Ewald sum is to solve the 
Poisson problem, 



N 



-A<p(x) = 4tt ]T P n (x) = 9n<*(x - x„ + p) 



n=l 

by introducing a charge screening function, 7(£,x), 

p"(x) = p"(x) - (p n * 7 )(x) + (p n * 7 )(x) . (6) 

s v ' s v ' 

:=p™.«(x) :=p". F (x) 

One then builds <p from 

AT 

= +¥> n>F ) (7) 

n=l 

after solving 

-A^ n > R (x) = 4vrp n ' R (x) and -A^'^x) = 4^ n ' F (x) . 



(a) (6) 

The screening function, 7, is required to go from 7(£, 0) = 1 to 7(£, ||x|| — >■ 00) = with suffi- 
cient regularity and be normalized ||7(£, x) || L 2 = 1. The most common choice is a Gaussian, 

7 (x) = eV-3/2 e -ax|P _ 7 (k) = e- fc2 / 4 € 2 , (8) 

and the classical Ewald summation result follows from this. Utilizing this screening function 
also in the 2P setting, it's a straight forward computation to solve (a), as it is essentially the 
same as in 3P. One arrives at: 

nRr \ erfc(£||x - x n + p||) 

f ' x = ^ q n p " 7 , x / x n . 

p ^ 2 ||x-x n + p|| 

In the limit x — > x„ we wish to remove the self-interaction, which, under the screening, 7, 
has partly been incorporated into (b), 

Hm ( eM£M) _ J_\ = ^ erf(e||x||) 2£ 



— >0 V ||x|| ||x|| J ||3c|| — >0 ||x 
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Summing, in light of (7), gives J2n=i f n,R {^ m ) = f R {^m) + ^ S (x m ), with 



R/ \ R \- \- „ erfc(g||x m -x TO + p|| 

\/7T 



where * denotes that the term p = is excluded when n = m. The last term, ip s , is usually 
referred to as self interaction. 

The second equation (b) is also treated along the lines of the classical derivation of the 3P 
Ewald sum, though mixed periodicity will play a bigger role here. Additionally, physically 
motivated conditions as z — > ±oo, consistent with the charge distribution, have to be satisfied. 
Returning to (5), let 

<p n ' F (r,z) = 7^ J ^^ n ' F (k, K )e ik - (r ~ r " ) e iK(z - 2 " ) d K . 

Differentiation gives that 

-A<p n > F = — I Y(k 2 + K 2 W l ' F (k, K )e ik - r e iKZ dK. (9) 
27T Jm. k 

On the other hand, using Poisson summation (Lemma 2.1) we get 



4Trp n ' F = 4irJ2 9«7( x " x « + P) = 7T / S ^)e ik - (r ^ r " ) e iK(2 - z " ) dK 

/ Ve-^+^^^e^^-'^e^-^^K. (10) 



P ez 2 ^ k 

L 2 



k 
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Equating (9) and (10) gives, for k + k > 0, 

, F 47rg n e-( fc2 +« 2 )/< 



V 7 



L 2 fc 2 + k 2 ' 
so that 

^,F { ^ z) = ^h A ^ el k.(r-r„) e «(,- Z „) dK + (/? n,F,k=0 > 

Up to this point the 2P Ewald derivation has deviated from the traditional 3P derivation only 
in the representation formula (5). However, there remains to discuss the k = term. We 
write 



Ft \ 1 \ \ ^ / e i 
V (X„ t ) = -2 ^ ^ / gn , 2 2 e 

^ n=lk ? 40' /R ^ +K 

so that J2n=i ^ n ' F ( r m, Zm) = f F {r m , z m ) + V3 F,k=0 . 
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Before determining the term (p F,]i °, one can proceed further with the integral in (11). 
Using Erdelyi (ed.) [13, Ch. 1.4, (15), p. 15], or more the more recent Zwillinger (ed.) [55, 
3.954 (2), p. 504], it follows that 



TV 



— g^k-(r m r n ) 



n=l k^O 



k 



k{z m -z n ) eiic (k + _ A + 



+ e -k(Zm-Z n ) evic ^_^ Zm _ Zn ^ 



Turning to the 2P-specific contribution denoted <^ k=0 - in the 3P Ewald sum (2), the 
(single) term k3 = is simply dropped due to the condition that (p integrates to zero, which 
is consistent with the charge neutrality constraint, J2n=i In = 0. In the 2-periodic setting 
the relevant condition takes the form of a dipole moment with respect to z, the non-periodic 
direction, 



2vr 



N 



n=l 



The derivation is found in Appendix A, where the remaining contribution is found to be: 



frrl 



2^F 
' L 2 



N (\ \ 

In ( - e ~ e ^ Zm - Zn)2 + ^{z m - z n )eri(£(z m - z n )) J 
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We now have all the terms present in (4) and the derivation is complete. To summarize, 
ip is computed from 



where 



N * 



n=l pel, 2 

N 

n=lk^0 JR 

N 



<p(yL m ) = ip* + ip* + ^' k -° + <p 
erfc(£|x m -x n + p|) 



s 

mi 



F 



2 



-(k 2 +K 2 )/4e 



k 2 + K 2 



(12) 
(13) 



n=l k^O 
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k 



A: 



e k(z m -z n ) ej . ic ( + ^ _ ^ I + 



A; 



+ e -^ m -^) erfc (__ e(zm _ Zn ; 



C^ k=0 



n=l 



(14) 
(15) 
(16) 



We shall refer to (12) as the 2P real space Ewald sum, to (14) as the 2P k-space Ewald 
sum. As we have already pointed out, these expressions have been derived before, e.g. by 



S 2 £ 

\/7T 
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Grzybowski et. al. [20]. However, they arrive at (14) in a completely different manner. For 
us, the integral representation of tp F (13) is the key result that we shall derive a fast and 
accurate PME-type method from. Kawata and Mikami [28] view (13) as a consequence of 
(14), which is of course valid (the expressions are equivalent), but runs counter to intuition. 
The theoretical foundations set forth in Section 2.1 not only enable our elementary derivation 
of (12)-(16) and the important choice (13) V (14), but are also required as we proceed. 

As is well established for the 3P Ewald sum, the infinite sums above may be truncated 
(which we elaborate on in Section 3.5.3). Evaluating (12) or (14) Vm G {1,2, ... ,N} has 
complexity 0(N 2 ) with a very large constant (that grows geometrically as higher accuracy 
is required). The contribution from the k = singularity (15) has the same complexity, but 
with a smaller constant. 



3 Spectrally accurate fast method for the 2P Ewald sum 

Here we develop a PME-like method with spectral accuracy to compute the reciprocal space 
2P Ewald sums (p F (14) and ip F,]l=0 (15). The treatment is self-contained, but the reader may 
benefit from being familiar with our previous paper [37] on the 3P problem. 

3.1 Fast method for (p F 

Consider the computation of the integral form of the 2P k-space Ewald sum (13): 

2 r e -(fc 2 +*W * 

hJ^ k2+K2 k 



2 , (k 2 +K 2 )/4e N 



We proceed as in [37], splitting the Gaussian term above into three parts using a parameter 
7] > (cf. Section 3.1.1), 



9 r p-^-vX^+k 2 )/^- 



where we have let 

N 



n=l 



Using the convolution theorem (Lemma 2.4) and known transforms one finds 



N 

H(r, z) = CJ2 q n e-^ r ~ Tnll *e-^ z - z ^ 2 , C = (2^ 2 /^) 3/2 , P = (17) 

n=l 

where || • ||» denotes that periodicity is implied (in the (x, y)-plane, nota bene). This expression 
can be efficiently evaluated on a grid, but to obtain H(k, k) on a suitable grid in k-space we 
must be careful. The computation in the periodic directions is simple - just take the FFT - 
but in the z-direction one needs to compute the Fourier integral. We refer to this operation 
as a mixed Fourier transform, MFT(-), i.e. 

H(k, k) = MFT(#(r, z)), (18) 
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and clarify this in Section 3.2. Moving on, let 



~ e -(i-^)(fc 2 +« 2 )M 2 ^ 

H(k, K ) := H(k, K ), (19) 



so that 



<p F (r m ,z m ) = A V / e- ik - rm e- iKZm e^ k2+K2 y^ 2 H(k, K )dK. (20) 



2 

To proceed we use PlancherePs Theorem (Lemma 2.3) with 

/(k,K) = e -^m e -iKZ me -ri{k->+K*)/&e 



and g = H(k,K). Noting that / is a k-space product between Gaussians and complex 
exponentials, one may compute it's inversion, /(r, convolution with 5- functions. 

Thus, invoking Lemma 2.4 and known transforms gives that 



V?(x m ) = % I I H(r,z) 



L 2 
1? 



C S(r' - r m )6(z' - z m )e-^ r '- r ^e-^ z '- z ^ 2 dr'dz' 



n 

2 



drdz 



r Jn 



H(r,z)Ce-^ r - r ^*e-^ z - z ^ drdz. (21) 



Again, H(k, k) — > H(r, z) on a grid in real space requires a non-trivial mixed transform 
(again, see Section 3.2), i.e. H(r,z) = MFT -1 (£T(k, k)). The integral (21) is evaluated using 
trapezoidal quadrature (to spectral accuracy). In contrast to other PME-type methods, the 
final result (21) is an equality - no approximations have yet been introduced. Naturally, as 
the integral is evaluated via quadrature, and a finite grid for x is employed, approximations 
will enter. We shall see that these errors are well controlled. 

To summarize, the algorithm is: (i) compute (17), (ii) take the mixed transform (18), 
(in) compute (19), (iv) take the mixed inverse transform, and finally (v) compute (21) at all 
desired points x m . 

The key point that we wish to convey with the derivations of Sections 2.2 and 3.1 is the 
minimal deviation from the treatment of 3P Ewald methods. In particular, the fast method 
presented here is equivalent to established 3P PME methods with the following exceptions: 
Gaussians (rather than e.g. Cardinal B-splines) are used in the charge assignment step (17); 
an integral is evaluated (rather than interpolation) to get point-values back, in (21); and a 
Fourier integral replaces the DFT in the z-direction of the transforms. 

3.1.1 Parameterization and modus operandi 

There is a free parameter, rj, that can be used to control the shape of the Gaussian used for 
the convolutions in (17) and (21). We find (cf. [37]) that a natural choice is to let 

2wC 2 



V= — , (22) 
V m J 

where w represents the half width of a Gaussian and m it's shape - see Figure 2. Let the 
domain VL = [0, L) 3 be discretized with M points in each direction and let h = L/M denote 
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h 2h 3h 




Figure 2: Top: Gaussians with different shape parameters, m± < ni2 < mz- Bottom: Gaus- 
sian with support on P grid points around Xj 

the grid size. It is implied throughout that L and M can be different in each direction, and 
subscripts will be used when necessary, i.e. L z and M z . 

Gaussians lack compact support, but they are highly localized. It is natural to truncate 
them, as is done in the non-uniform FFT [19, 12]. We let P < M denote the number of grid 
points within the support of each Gaussian, as seen in Figure 2 (bottom). This naturally 
implies that we take w = hP/2. Furthermore, our analysis shows (cf. Section 3.5) that the 
shape parameter, m, can be chosen asm~ \[P. This leaves us with a single parameter, P. 

Remark 3.1. In contrast to traditional PME methods, we consider the grid size, M, fixed 
(determined by the truncation estimates of the Ewald sum). The approximation errors added 
by the fast method are controlled by increasing P, the number of points within the support of 
each Gaussian. 

3.2 Computation of Fourier integrals via FFT 

We now make important clarifications regarding the non-trivial mixed transforms present 
in the fast method above. Recall that we have two transforms to compute: (i) from the 

gridded charge distribution H(r, z) to H(k, k), and (ii) from H(k, k) to the real-space function 
H(r, z). Again, H £Vq, and we think of this mixed periodicity in the following way: H(r, z) 
is periodic in r = (x, y) and free in z - hence, H has a transform, -f^(k, k), where k is discrete 
in the sense that k E 27rZ 2 /L, and k E K is a continuous transform variable corresponding to 
the non-periodic dimension. 

Remark 3.2. Regardless of how these transforms are computed in practice, it is important to 
remember the underlying mathematics: a 2D discrete Fourier transform in (x,y), together with 
a Fourier integral transform in z. The discrete transforms are, of course, computed accurately 
via the FFT, whereas the integral transform raises the specter of very large numerical errors. 
This is one of the defining differences between Ewald methods in 2P and 3P. 
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Furthermore, we require that the integral transforms can be computed in the same, 
0(M 2 logM 2 ), complexity as the corresponding DFT in the 3P method, so as to stay rel- 
evant for large-scale calculations. 

We now outline how this is done, based on remarks by Press et. al. [45, pp. 692-693]. Let 
f(x) G C°° decay sufficiently fast in the the interval (a, b) that it's Fourier integral transform 
can be truncated 

L,b(k) := f f(x)exp(ikx)dx, \f(k) - f afi (k)\ < e, (23) 

J a 

for some small e. Discretize the interval in real space with M subintervals of size h = 
(b — a)/M. We approximate the integral in a midpoint fashion, 

M 

fa,b{k) ~ TM{k) -=h^2 f{xj) exp(ikxj), with Xj = h(j — 1) + a + h/2 = hj + a — h/2, 
j'=i 

M 

=hexp(ik(a - h/2)) f(xj) exp(ikhj). (24) 

3=1 

Suppose we want to evaluate f(k) on a reciprocal grid ^{— M/2, . . . , M/2} 3 k. Then the 
remaining sum can be identified with the discrete Fourier transform, so that f(k) is obtained 
on the entire reciprocal grid by a single FFT. 

The task of computing Fourier integral transforms numerically is well studied in a broader 
context - for instance there are the famous Filon-type quadratures (named after L. N. G. Filon 
who worked on predecessors to current methods in the 1920's [16]). A modern starting point 
is Iserles and Norsett [26, 27], where matched asymptotic expansions are used to formulate 
accurate numerical methods for highly oscillatory integrals. They aim to compute f a ,b(k) for 
a single (very large) k with relatively few evaluations of /, under much weaker assumptions 
on / than we have here. For the present calculations, we may view the integral as moderately 
oscillatory. By this we mean that / falls off fast enough that the maximal characteristic 
frequency, {b — a)k\ ~ M, of interest is modest even for very high accuracies (as shall become 
clear soon). Furthermore, we need to compute the Fourier integral on all points k on a 
reciprocal space grid, and Iserles [26, p. 367] indicates that an FFT-based method is the 
appropriate choice (cf. earlier work by Narasimhan [40]). 

That said, we return to the midpoint quadrature (24). Press et. al. offer appropriate 
caution over this quadrature method: Again, the integral (23) is oscillatory for large k, and, 
since the maximal k is proportional to M, it is not obvious in what sense Tm converges to 
fa,b(k) as M grows. Of course, the corresponding inverse Fourier transform may be treated 
similarly, and the same caution applies. To investigate the numerical errors involved we now 
consider two carefully chosen ID integrals. 

3.2.1 Fourier integral transform of Gaussian via FFT 

In light of the computations relevant to the present work, we first restrict ourselves to a 
Gaussian and its transform, 

f(x)=exp(-(3\x-x')) ^ f(k) = ^Pexp(-k 2 /(^ 2 ))exp(-ikx'). (25) 
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Figure 3: Left: Convergence of quadrature method (24) for computing Fourier transform of 
Gaussian in ID (25). Parameters, /3 = 9, x' = 0.51, a = —0.1, b = 1.1, chosen to avoid the 
symmetric cases when accuracy comes easier. Right: In the modus operandi of the proposed 
method - convergence of quadrature method, as Gaussian support increases, with M = 110 
(fixed), £ = 8, a = -0.1, b = 1.1, x' = 0.51, m = 8. 



In Figure 3 (left) we numerically demonstrate spectral convergence (24), ||Tjvf(fc) — f(k)\\oo ~ 
exp(— c(/3)M 2 ) for some c > that naturally depends on (3. 

We can go further with the ID example and introduce the parameterization, j3 = 2^ 2 /r] in 
the transform pair (25), and modus operandi of the Gaussians in the fast method above. That 
is, let the domain x 6 (a, b) be discretized with M points, and let a Gaussian have support 
on P points around x = x' . Considering M fixed and increasing P, the discrete support of 
the truncated Gaussians, gives the convergence results in Figure 3 (right). Numerically, we 
have demonstrated that: 

Remark 3.3. The trivial ID quadrature method (24) for the Fourier integral transform (25) 
(with (3 = 2i 2 lr}) converges, \\Tm p m (k) — /(fc)||oo ~ exp(— c{m)P 2 ), independently of M and 

Hence, the quadrature required to get into frequency domain has the important charac- 
teristic that approximation errors are controlled by P, the resolution of Gaussians, alone (so 
that the grid size can be determined by a truncation estimate for the underlying Ewald sum) . 
This numerical result is well supported by the analytical error analysis of our 3P fast Ewald 
method, cf. [37]. In essence - a Gaussian decays fast enough that no numerical difficulties 
arise from the oscillatory nature of the Fourier integral, so the trivial quadrature converges 
to machine precision with no need for e.g. oversampling. 



3.2.2 Inverse Fourier integral transform of non-Gaussian via FFT 

Having concluded that the mixed transform of the gridded charge distribution (17) into 
reciprocal space poses no particular numerical challenge, we turn to the relevant inverse 
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Figure 4: Inverse transform (26). Left: Illustration of non-converging errors when no over- 
sampling (sf = 1) is used. Right: Convergence of FFT-based quadrature from (25), up to a 
normalization, to the exact transform (26), £ = 8. 



transform. With (19) in mind we consider 

/(«) = ^2 k = 2n/L, 

and the inverse transform 

/(*) = ^ / /Me^d/c. (26) 

This integral does not offer an obvious closed form as in the previous example. None the 
less, for rj < 1, the integrand is smooth and integrable on M. In this particular section we 
employ arbitrary-precision integrators from Mathematica 7, so that the FFT-based quadrature 
method can be evaluated down to the regime of machine precision. 

The quadrature method for the inverse transform is, up to a normalization, identical 
to the approximate forward transform (24). Again, we associate the reciprocal space grid 
^-{—M/2, . . . , M/2} 3 k with a uniform staggered grid on the real-space interval [a, b}. How- 
ever, we find that this reciprocal grid is too coarse. Instead we consider the family of over- 
sampled grids, Ak{-s/M/2, . . . ,s f M/2}, s f € Z+, Ak = 2ir/(sfL). Evidently, with s f = 1 
there is no oversampling. 

In Figure 4 we present numerical result for a sequence of grids. Note that without over- 
sampling there are very visible artefacts of periodicity visible and no convergence. It is evident 
that this transform is significantly harder to compute than the transform of the pure Gaussian 
(cf. Section 3.2.1). That said - oversampling the FFTs by a small factor (up to six times for 
double precision accuracy) is well within the realm of practicality, as we shall return to. It 
is worth emphasizing that "oversampling" as we defined it here can be given various other 
equivalent meanings and monikers, such as "zero padding" and "s points per wavelength". 



3.2.3 Extension to functions in Vq 

The results from the ID analyses generalize to the relevant 3D (or, rather, 2P) transforms 
directly. That is, the transform H (r, z) — > H(\i, k) is computed via a 3D FFT, where the 
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"third" FFT is thought of as a quadrature operation (and appropriate pre-factors enter). 
We let T 2P denote the immediate extension of the ID quadrature scheme (24) on a grid 
M := [M xy , M xy , M z ]. One may again pose this computation in terms of the "fixed-grid, 
variable Gaussian support "-setting. To no surprise, one immediately 1 finds that 

\\T^ Pm (k, K ) - H(\t, k)\\oo ~ exp(-c(m)P 2 ), 

independently of M and £. Again, this has theoretical justification in the error estimates of 
the 3P method [37]. The inverse transform H(k,n) — > H(r,z), requires over-sampling by at 
least Sf = 2 in the third dimension, as in the ID example. 

The inpatient reader may wonder why we have not precisely defined the quadrature meth- 
ods in 2P including the pre-factors for both the forward and inverse transforms. The reason 
is that this somewhat laborious exercise in notation is not needed - between the forward 
and inverse transforms in the fast Ewald method, only a multiplication (19) occurs, so the 
pre-factors cancel by linearity. This suggests that we are back to "just 3D FFTs" as in pure 
3P Ewald methods. Recall, though, that we are still in the 2P setting, computing (12) - 
(16) including the k-singular contribution (15) which we discuss in Section 3.4. Additionally, 
we contend in the next section that an FFT-based quadrature method is efficient only when 
the underlying grid function is C°° smooth. In particular, adapting traditional PME-type 
methods to 2P, as in [28] , leads to much greater numerical challenges. 



3.2.4 Why not Cardinal B-splines and SPME? 

The reader who is familiar with fast Ewald methods may wonder what role the charge- 
assignment scheme (17) plays for the computation of the mixed transform. We use Gaussians, 
e -a(x-x') ^ k u j. j g ky nQ means on \y choice. The Smooth Particle Mesh Ewald (SPME) 
method [14], for instance, uses Cardinal B-splines in the corresponding step. These are given 
by 

W = (^1)7 B" 1 )* k \(p- k)\ {u " {x)+ := max(x ' 0) = {x + N)/2 ' (27) 

where p is the order of the spline, and have a known Fourier transform: 

m - fcl±^!. ,28) 

Indeed, Kawata and Mikami [28] propose a SPME-like method for the 2P case that uses this 
charge assignment function and their method is also based on the sum/integral (13). Thus, 
we ask how the the simple quadrature method (24) applied to (27) converges to (28). We note 
that F p £ C p and F p ~ k~ p . In Figure 5 we illustrate the expected convergence, M~ p , as the 
number of grid points in the quadrature method (24) grows. The contrast to the convergence 
in the Gaussian case, Figure 3, demands more than passing notice. 

The loss of regularity by going from Gaussians to Cardinal B-splines is significant and it 
lies at the heart of our argument. An FFT-based quadrature method (as used here and in 

1 Explicit numerical results for the Fourier integrals of 2P functions, analogous to Sections 3.2.1 and 3.2.2 
are omitted for brevity and concern over repetition. The propositions of spectral accuracy of the quadrature 
in the mixed transforms (and the need to oversample the inverse transform) are supported by the numerical 
evaluation of the complete fast method, cf. Section 4.1. 
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Figure 5: For Cardinal B-spline (27), convergence of FFT-based quadrature (24) to exact 
Fourier transform (28). As expected, the error behaves as M~ p (dashed line), where p is the 
order of the B-spline . Here, p = 5. 



[28]) loses a lot of accuracy as Gaussians are replace with B-splines in the integrand. The 
results given here indicate that hundreds of grid-points will be needed in the z-direction 
to compute the forward transform (analogous to the step H(r,z) —> H(\s.,k)) with decent 
accuracy. Additionally, we saw in Section 3.2.2 that the mixed inverse transform (in our case 
H(k, k) — > H{r } z)) is the main numerical challenge. We contend that it will be doubly so if 
Cardinal B-splines, or any other charge-assignment scheme from 3P PME methods, are used. 
The grid sizes and oversampling factors seen in [29, 28] seem to support this position. 



3.3 Fast gridding 

The expressions (17) and (21) involve computing N exponential functions for each point x on 
the grid. If the grid has M 3 points this naively suggests NM 3 evaluations of exp(-), which 
drops to NP 3 with the truncation from Section 3.1.1. This, as it turns out, is still many more 
than are needed if one uses the Gaussian gridding approach of Greengard and Lee [19]. 
The grid-representation of our source distribution (17), is a sum on the form 



3/2 



iV 



F(x)=(~) Y,1ne~ allx - Xnl12 . (29) 



n=l 



For clarity here we shall suppose that the Gaussians are not truncated. The key observation 
is that we wish to evaluate -ff(x) on an equidistant grid, i.e. x = [ih,jh,kh], where (i,j,k) 
are integer index triplets in the range 0, 1, . . . , M — 1. To see how we can reduce the number 
of computations of exp(-), take the analogous ID Gaussian, 

e -a(x-x n ) 2 _ e -a(ih-x n ) 2 _ e -a((ih) 2 ^ihxn+x^ ) 

= e -a(ih)* ( e 2ah Xn y e -axl_ ^ 
(a) (b) (c) 

Note that the term (a) is independent of x n , so those M evaluations of exp(-) are done once, 
stored and reused for each of the ./V sources x n . The terms (6) and (c) each incur one exp(-) 
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for each x n . The same procedure is then applied for e~ a ^ y ~ Vn ^ 2 and ditto for z. For full 
algorithms, additional details and important remarks, we refer to [37]. The bottom line is 
that, rather than having to compute NP 3 exponentials, the gridding step requires P 3 + AN 
exponentials and 0(NP 3 ) multiplications. This translates to a significant performance gain 
in practice. 

3.4 Fast method for ip F > k=0 

Turning now to an efficient and accurate method for evaluating the singular part of the 
reciprocal space 2P Ewald sum: The computation of (15), 

9 N 

^k=0 = ^=0 {Zm) = - 2 -pJ2 «nf(Zm ~ Z n ) (31) 

L n=l 

f{z) = e-^ 2 /i + ^ze^z), (32) 

is much less complex than the fast computation of (13) - it is a finite sum over terms that only 
depend on z. The obvious approach to avoid 0(N 2 ) complexity is an appropriate interpolation 
method (sometimes imprecisely referred to as table lookup), and the natural choice is to use 
Chebyshev polynomials. This method is close to optimal in oo-norm and cheap to compute 
(even though there is no periodicity in tp F]l=0 (z)). More precisely, we have z n € [0,L Z ], and 
let Pk, k = 1, 2, . . . , Mt <C N, be the set Chebyshev- Gauss points cos(7r(2/c — 1)/ (2Mt)) scaled 
to the interval [0, L z \. We expand ip F > k=0 in terms of Chebyshev polynomials 

M T 

^=\z) « T(z) = J2 c 3 Tf^\z), ^=\ Pk ) = T( Pk ), 

3=1 

where T^ ,Lz \z) is the j:th Chebyshev polynomial scaled to the relevant interval. The coeffi- 
cients Cj are easily computed after evaluating (/9 F ' k=0 (pfc), and Mt is in essence an accuracy 
parameter - so the complexity of this task is 0{N). We are dealing with interpolation in 
ID, so the computational resources involved are entirely trivial 2 . Note that the well-known 
Clenshaw formula should, for reasons of numerical stability, be used when evaluating the 
orthogonal basis J2 c f^A z )- 

Very strong error bounds exist for Chebyshev interpolation (see e.g. the classical references 
Rivlin [47, 48]), such as 

e M ■= max \f(z)-T f (z)\ < 



ze[o,L z ] lJy ' Wl - 2«t- 1 M t !' 

for some c G [0,L 2 ], where T* is the interpolant of order Mt to / (32). It is evident that 
/ possesses Mt derivatives. However, each differentiation yields roughly a factor £ 2 , so the 
interpolation error will ultimately depend on £. This suggests an interpolation estimate of 
the form eM ~ (7£ Mt 2~( Mt ~ 1 ) /Mp!, but that turns out to be impractical and inaccurate due 
to the very large quantities involved. Instead, we find that 

eM «e 3 2-«, c = vrC 1/2 , (33) 



2 As a point of reference, with N = 10 6 particles and Mt = 40 Chebyshev polynomials it takes roughly one 
second to evaluate tp F ' k=0 (z m ) at all points z m , m = 1, . . . , 10 6 . 
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Figure 6: Error in oo-norm when computing ip F ^ with Chebyshev interpolation, with 
N = 10, together with "practical" error estimate (33). Left: £ = 5. Right: £ = 10. 

provides a useful form. Again, one may treat error estimation here with some laxity, as the 
performance penalties associated with being cautious (i.e. taking My needlessly large) are 
very small. We give brief numerical results in Figure 6. 

In their method, Kawata and Mikami [28] propose a similar approach based on B-splines. 
These, of course, have polynomial accuracy order. A small numerical experiment (omitted) 
indicates that the previous remarks about computational triviality then fail to apply (at least 
in the broad accuracy regime considered). 

3.5 Error analysis 

We now gather strands of numerical and theoretical results into an aggregate view of the nu- 
merical properties of the proposed method. This serves the dual purposes of putting previous 
statements of accuracy on secure theoretical foundations, and providing useful guidance for 
the often intricate task of parameter selection. As previously alluded to, we start from the 
classification of numerical errors into two categories: errors that stem from the underlying 
Ewald sum (4) and errors that stem from the fast method of the present section. 

3.5.1 Truncation estimates for Ewald sums 

Turning to the 2P Ewald sums for tp R (12) and <p F (14), we note that both sums are infinite 
but rapidly converging. The real space sum (12) is unchanged vis-a-vis 3P, and has been 
thoroughly studied in that context. The reader may already be familiar with the famous error 
estimates by Kolafa and Perram [34], which suggest that the truncation error committed by 
letting ||x|| < r c may be estimated by 



where Q := J2n=i Qn an d the RMS norm is defined as e rms := \J N^ 1 J2n=i(. ( Pn — y*(x n )) 2 . If 
particles are statistically correlated, i.e not randomly scattered, an oo-norm measure may be 
more appropriate, see e.g. Strain [51]. 




(34) 
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Figure 7: Left: Convergence of real-space sum (12) as function of truncation radius, ||x|| < r c , 
for (right to left) £ = 4, 6, 8, 10 and the error estimate (34) in dashed. Right: convergence of k- 
space sum (14) as a function of the truncation, ||k|| < 27rk OCl /L, for (left to right) £ = 4, 6, 8, 10 
and the error estimate (35) in dashed. 



Correspondingly, for the 3P k-space Ewald sum - truncated at finite number of modes, 
koo £ Z + , i.e. ||k|| < 2-irkoo/L - Kolafa & Perram [34] suggest that 

«&.(ftbo,0 - e-- 2 ^o 3/2 v^exp (- (|^) 2 ) . (35) 

The feasibility of (35) as a 2P estimate may come as a surprise, as it arose from analysis 
of the 3P sum. We contend that this is quite natural - roughly speaking, each dimension 
has to converge. Figure 7 shows numerically that (34) and (35) capture the behavior of the 
truncation error with striking agreement. 



3.5.2 Approximation errors 

The second family of errors are those that stem from the fast method, described in the 
preceding sections, notably the error due to the quadrature used to evaluate (21). An extensive 
treatment is given in [37], where we prove the following theorem: 

Theorem 3.4 (Error estimate). Given £ > 0, h > and an odd integer P > 0, letw = hP/2, 
and define r] according to (22). The error incurred in evaluating (21) by truncating the 
Gaussian at ||x — x m || = w and applying the trapezoidal rule T p can be estimated by 

\<P-T P \<C 2p2 /( 2m2 ) + erfc (m/v^)) • (36) 

From this we surmise an appealing choice of the shape parameter, m(P) ~ vttP, which 
then yields a quadrature error estimate 

E Q (P):=\<p F -T P \^Ce-* p '\ (37) 

to be verified in Section 4.1. Two other errors emerge, specific to the 2P method: First, 
there is an interpolation error from the fast method for ip F,]i=0 , as we investigated in Section 
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3.4. Secondly, there is the need to oversample the inverse transform when computing H(r, z), 
which we devoted Section 3.2.2 to. We view the oversampling guidelines from Figure 4 as 
generally applicable, and content ourselves with that. 

3.5.3 Choosing parameters 

There are several parameters present in all (fast) Ewald methods and they should be chosen 
with two goals in mind: balancing the work between the real- and k-space sums (by choosing 
£), and attaining a desired accuracy (selecting e.g. an appropriate PME grid M). The first 
concern is inherently implementation-dependent and work-balance will depend strongly on 
TV, the number of charges. Ipso facto, there can not exist an optimal parameter set of broad 
applicability, and there is no generally accepted tuning method. 

The second concern, assuring that the end-result satisfies a desired accuracy is also an open 
question (and likely to remain that way). Two approaches stand out in the literature: using 
an optimization technique, and relying on apriori error estimates. Among the advocates of 
"optimization" 3 are Kawata et. al. [32] and Ghasemi et. al. [18]. The former (cf. [32, Tab. 2], 
where eight parameters are determined) suggest a high degree of irregularity in the parameter 
set (which is either incorrect per se or grounds for concern over the numerics involved). In the 
latter work, "Pareto frontier optimization" is used (on a set of five parameters) for a specific 
crystalline system. In both cases, it is implied that a similar investigation should be performed 
whenever a new system is under consideration - but the optimization technique depends on a 
sufficiently accurate reference solution being computable by the underlying Ewald sum, which 
naturally restricts the method to small TV. 

On the other hand, relying on a priori error analysis alone has to confront a complicated 
mix of numerical errors. Whereas the picture is clear for pure 3P Ewald summation (2), as we 
discuss in Section 3.5.1, fast methods often pose stiff challenges to error estimation. In their 
survey of fast 3P methods, Deserno and Holm [11] sketch the parameter space and remind us 
that many numerical errors are interdependent. Methods for the 2P situation are less mature 
and fewer error estimates have been established. A notable exception is the MMM2D method 
by Holm et. al. [3, 2], which enjoys sharp error estimates that are suitable for parameter 
selection. In [8], they provide analysis for the case when the 3P Ewald sum is used for 2P 
systems, though results are absent for the PME-accelerated case. 

Our view is that error analysis should be the primary focus, but a certain amount of 
experimentation is a worthy complement. A particular goal is to have errors that decouple, 
so that parameters can be chosen in sequence and numerical experiments can treat one pa- 
rameter at a time. This is by no means simple - established PME methods for the 3P Ewald 
sum have approximation and truncation errors in a tangle after doing charge-assignment by 
e.g. B-splines, as we elaborate on in [37] - but in the present work decoupling is achieved. 
Furthermore, to be useful, error estimates need to be sharp and simple enough that they are 
"solvable" for a desired parameter. 

As a sequence of considerations we suggest 

1. Determine a truncation radius, r c , such that the real space sum is cheap to compute 
(cf. neighbor list methods [1] or as summarized in [37]). 

3 A more correct description of the parameter optimization problem may be "scanning", and should generally 
not be confused with numerical optimization techniques (such as gradient-based methods). 
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2. Select Ewald parameter, £, such that the real space sum has converged to within a given 
tolerance, e, at ||r|| < r c by invoking e.g. (34), 




(38) 



where W(-) is the Lambert W-function (also known as the product logarithm, it is among 
the "special functions" provided in e.g. Matlab and Mathematica, defined as the inverse 
of f(W) = We w [5]). 

3. Then determine the truncation, koo, of the k-space Ewald sum (4), such that the same 
tolerance is met, from (35): 



: ; ,S (39) 



27T \ ^37T 2 /3L2^/3 e 4/3 

This gives the grid size to be employed in the spectral PME method: M — 2/cqo. 

4. The quadrature error estimate (37), implies the number of points within the support of 
each Gaussian: 

p> _2Llog(£/C^ 

TT 

Note that one may get P > M (if £ small) , in which case one has to increase the grid 
size, i.e. M = max(P, 2k 00 ). The constant, C, here (from the error estimate (37)) does 
not depend on £, so P is perhaps most conveniently identified from a basic convergence 
test, e.g. Figure 8 where the estimate is plotted with C = 10. 

5. Select a oversampling factor, Sf, for the reciprocal space calculations of the fast method, 
as discussed in Section 3.2. 

6. Finally, determine the Chebyshev grid (cf. Section 3.4) for computing (f^T° (15). 

If this sequence of steps seems laborious, it might be worth pointing out that having 
a spectrally accurate method at hand makes it cheap to err on the side of caution - the 
parameter with the greatest impact on run-time is P. Again, it's a sequential process - 
rather than a non-linear optimization problem. 



4 Numerical Evaluation 

4.1 Accuracy of spectral PME method 

In Section 3.5 we showed that convergence of the order e~ aP is to be expected (we have taken 
m = 0.91\/ttP). To test this we consider small systems, so that the 2P Ewald sum (4) can be 
accurately computed as a reference, denoted (p*, and measure errors in the following norm: 

1 N 

e: =iv ^|/(x m )-^)|. (40) 

m=l 
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Figure 8: Convergence of SE2P method to Ewald sum, in norm (40), for various oversampling 
factors, with quadrature error estimate (37) as dashed line. The computational domain is 
[0, l) 2 x [-1/2,3/2], so M z = 2s f M. Left: £ = 4, M = 21. Right: £ = 12, M = 50. 

We draw x m G [0, l) 3 from a uniform random distribution, randomize charges under the 
constraints that Y q m = and Y \q m \ = 1. The minimal computational domain is [0, l) 2 x 
[—w, 1 + w], but as w = w(P) we take [0, l) 2 x [—1/2, 3/2] to avoid having a PME-grid that 
depends on P. We take N = 50, consider two cases: £ = 4, M = 21 and £ = 12, M = 50. The 
convergence results in Figure 8 support several conclusions: (i) spectral accuracy as predicted 
by theory; {ii) convergence rate independent on Ewald parameter £; (iii) oversampling the 
grid in the z-direction by a factor three (or six as L z = 2L XtV , i.e. M z = 2s fM = 6M, 
depending on how you look at it) is sufficient to get double precision accuracy. 

4.2 Computational overview 

To give a sense of the practical characteristics of our method, we give a brief overview of 
the run-time profile with our implementation. We have previously discussed the need to 
oversample the Fourier transform in the z-direction, and the question then naturally arises if 
this incurs a significant cost. We note that the (x, y)-grid isiWxM and, by the error analysis 
presented, M rarely needs to be bigger than 50. Hence, even in the case where we oversample 
by a factor 6 - to safely commit an quadrature error in the Fourier integral transform on the 
order of machine accuracy, cf Section 3.2 - the total grid size is 6 x 50 3 = 750000 elements 
(stored in about 6MB). In the works cited, grid sizes of up to 512 3 are mentioned (at much 
more modest accuracies). 

As expected then, the computational burden in our method falls on the gridding steps 
(17) and (21). We illustrate this for two systems in Figure 9. Here we let £ = 8 and target an 
accuracy e ~ 10 -10 (see caption for further details). These are single-core results obtained on 
an ordinary workstation computer (Intel Core2Duo E6600), implementation in C. In Table 1 
we give performance numbers for the gridding step in terms of the support P. 
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Figure 9: Runtime profile, i.e. time spent in different parts of fast algorithm, where "grid" 
refers to (17), "Poisson" refers to (19) and "int" refers to (21). Left: N = 10000, M = 20, 
P = 15. Right: N = 10 5 , M = 40, P = 17. In both cases FFT oversampled by factor 
six in ^-direction, M z = 6M. Despite that, the transforms take a trivial amount of time to 
compute. 
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Table 1: Time for gridding (in microseconds per particle) for different support P. 
4.3 Scaling to large systems 

Computing the real-space sum (12) has been ignored up to this point, save for the remarks on 
parameter selection of Section 3.5.3. In this regard we follow the conventional line of thought 
- computing (12) has 0{N) complexity iff each particle interacts with a fixed number of 
neighbors that lie within a radius r c , as ./V grows. This implies either (i) that the domain grows 
(so that N/L 3 constant, and r c constant), or (ii) that the interaction radius, r c , decreases, as 
N grows. Regardless, the grid size, M, will grow. 

Following the second approach, we return to the parameter estimates in Section 3.5.3 and 
note, by (38), that £ grows as r c becomes smaller. Consequently, and by (39), the grid size M 
will grow with N. Hence, the complexity of our proposed spectrally accurate PME method 
is O(NlogN). Note that before the O(N) complexity of the real space sum is imposed, the 
calculations (17) - (21) have complexity 0(NP 3 ) + C(M 3 logM 3 ) = 0(N). As we suggest 
in the previous section, the constant in front of the first term is quite a bit bigger than the 
constant in the FFT part. Thus, the logiV factor is not seen in practice. 

To verify this, and clarify the parameter selection process (cf. Section 3.5.3), we give 
scaling results (measured run-time to compute ip F ) in Figure 10, including the parameter 
table (right). Here, the target accuracy is e = 10 -9 , we start with N = 1000 and let P = 15, 
and invoke the estimates as described. 

5 Summary and concluding remarks 

In our survey of methods to compute the sum of Coulomb potentials (1) we argue that methods 
for the 2P case are less mature and consistent than their 3P cousins. Hence, the desire from 
the applications community for an established tool for 2P electrostatic calculations is to some 
extent unsatisfied. 

We aim to close the gap between 2P and 3P Ewald methods by two provisions. First, we 
derive the 2P Ewald sum (4) using the established methodology of screening functions that 
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Figure 10: Left: Run-time as a function of N, with e = 10 . Right: Parameters to scale up 
system at constant cost for real-space sum 

follows the 3P case closely (Section 2.2). Secondly, we derive a fast PME-type method for 
the 2P k-space sum that fits well in the established PME framework (Section 3.1), which we 
refer to as SE2P. These derivations were made possible by representing functions on mixed 
periodicity (x, y periodic, z "free") in frequency domain via a mixed Fourier transform (5), 
see Section 2.1. This point of view is natural and clarifies the relationship between 2P and 
3P Ewald methods to an extent that we do not believe has been previously reported. 

In light of this, we conclude that a fast PME-type method for 2P will have to compute a 
mixed transform (a discrete Fourier transform in the periodic variables, and an approximation 
to the continuous Fourier integral transform in the free dimension). Efficiency constraints 
suggest that the quadrature for the Fourier integral should be based on the FFT. We studied 
this problem in Section 3.2, and point out that the accuracy of an FFT-based quadrature 
method will depend on the regularity of the integrand. 

Hence, a method using the SPME approach (using Cardinal B-splines to represent regu- 
larized charges on the grid) will have to deal with vastly reduced accuracy in the quadrature 
step of the Fourier transform vis-a-vis the approach that we suggest, which uses Gaussians 
that are C°° smooth. Whereas established PME methods may be seen as adequate, in terms 
of accuracy, for the 3P case, our analysis suggest that that may not be true in 2P (Section 
3.2.4). The SE2P method is similar in structure to the work by Kawata et. al. [28], though 
it appears to offer significant advantages. 

The method we propose (Sections 3.1 to 3.3) for computing the 2P k-space Ewald sum is 
spectrally accurate, meaning that all errors decay exponentially (as we establish theoretically 
in Section 3.5 and verify numerically in Section 4.1). More specifically, the numerical errors 
present stem from two sources: truncation of the underlying Ewald sum (Section 3.5.1) and 
approximation errors introduced by the fast method (Section 3.5.2). A well established error 
estimate for the former is used to determine the appropriate grid size, M. Our error analysis 
of the latter is used to determine the number of points within the support of our Gaussians, P. 
To our knowledge, this is the only fast Ewald summation method that is spectrally accurate 
and the only one that retains a decoupling of errors as discussed here - a fortiori in 2P. 

Moreover, the proposed method is efficient, capable of dealing with N ~ 10 6 in a few 
seconds (Section 4.2). In particular, we see that the grid sizes needed are very small - so 
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small that the Fourier transforms are cheap to compute, even when allowing for oversampling 
the ^-dimension. The computational burden falls more heavily on the gridding steps (17) and 
(21). The Fast Gaussian Gridding approach (Section (3.3)) alleviates this to a large extent. 

We believe that these properties - accuracy, clear parameter selection, efficiency, and 
closeness to 3P methods - present a compelling case for the proposed method for electrostatic 
calculations in planar periodicity. 
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A Derivation of 2P Ewald sum, details 

The derivation of ip F,k=0 (z), the singularity contribution (15), is given here for completeness 
and because it's illuminating in it's own right. 

To make this clear, we first disregard the screening, 7, i.e. consider 

N 

-Ap(x) =4ttJ2 /° n ( x )> P n ( x ) = E 9n<5(x - x n + p), 



11=1 

under an assumption of charge neutrality, J2n=i In = 0> an d the condition 



2vr N 

i^^^s^ (41) 

n=l 



Provisionally, as in Section 2.2, 

The terms corresponding to k / are uncomplicated and can be integrated, 

N 



*) : = ^ E / E Qn^-^e^'^dn 

9-7T N 1 

= ^ E E Sn^e-INI— l e *.(r- r „)_ (42) 
L k^0n=l H K H 

Note that lim 2 _ > ± 00 (p = 0. Hence, we seek a term ip°(z) with the desired behavior (41) at 
z — > ±00. Then, <^(x) = <£(x) + ip°(z) will be a unique and well defined solution to the 
2-periodic Poisson problem under consideration. 

If we disregard the boundary condition (41), it's evident that 92 is only determined up to 
a piecewise linear function. Consider the adding the term <p°(z) = bJ2n=i Qn\z — z n \. Using 
charge neutrality, one finds that 

N 

lim tp°(z) = TbJ^qnZn- 

x-^±co * — ' 
n=l 
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This solution, with b = —j^, can be found e.g. by considering the one-dimensional Green's 
function for the k = mode, see Genovese et. al. [17]. With this, 

9^r N 1 9-7T N 

4EE ft»|iUj| e " l|k|l|j( "*" ' eik ' (r_rn) - Ti E 1n\z ~ Z n \ 
L k^0n=l l|K|1 L n=l 

satisfies (41). 

There's a natural correspondence between (p and ip F (14). However, the crucial point 
when the decomposition (6) enters is that some of the k = mode will be included into the 
real-space sum. Therefore, rather than taking (p F ' k=0 equal to p°, we subtract the real-space 
term (which is most accessible as the difference between ip and <p F ), 

^=\z) = <p°(z) - lim - f F k ) . 

We introduce 

— N 
lim (ip F k - = ^2 E ^ a ( z - 

n=l 

and compute 

a(z) = lim-( e fc2 erfc f ^ + £z) + e- fc2 erfc f — - £z) - 2e~ k ^ 
k^ok V \2£ / \2£ / 



2 /l 



f^e-« 222 + v^(-|z|+zerf(^)) 



Finally, 



it N 2tt N 

n=l n=l 

2Ji" /l 



E *» (V^*"*") 2 + y^(^ - z»)erf(£(* - z n ))) , 

n=l ^ ' 

as we set out to show. Using charge neutrality, the limits (41) can be verified. 
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